Array based methods for synthesizing nucleic acid mixtures

ABSTRACT

Methods for generating mixtures of nucleic acids, e.g., oligonucleotide primers, are provided. In the subject methods, an array is employed as template to generate mixtures of nucleic acids via a template driven primer extension reaction. In preferred embodiments, each probe on the array employed in the subject methods comprises a constant domain and a variable domain, where the constant domain is further characterized by having at least a recognition domain. Also provided are the arrays employed in the subject methods and kits for practicing the subject methods. The subject methods find use in a variety of applications, including the generation of target nucleic acids from an mRNA sample for use in hybridization assays, e.g., differential gene expression analyses.

FIELD OF THE INVENTION

The field of this invention is molecular biology, and particularly gene expression analysis.

BACKGROUND OF THE INVENTION

The characterization of cellular gene expression (i.e., gene expression analysis) finds application in a variety of disciplines, such as in the analysis of differential expression between different tissue types, different stages of cellular growth or between normal and diseased states.

Fundamental to differential expression analysis is the detection of different mRNA species in a test sample, and often the quantitative determination of different mRNA levels in that test sample. In order to detect different mRNA levels in a given test population, a population of labeled target nucleic acids that, at least partially, reflects or mirrors the mRNA profile of the test sample is produced. In other words, a population of labeled target nucleic acids is generated where at least a portion of the mRNA species in the test sample are represented, in terms of presence and often in terms of amount. Following target generation, the target population is contacted with one or more probe sequences, e.g., as found on an array, whereby the presence and often amount of specific targets in the target population is detected. From the resultant data, information about the mRNAs present in the sample, i.e., the mRNA profile and gene expression profile, can be readily deduced.

A fundamental step in gene expression analysis assays is, therefore, the step of labeled target generation. Target generation protocols typically include a primer extension reaction, in which a primer is contacted with an initial mRNA sample to produce a labeled target population, as described above. In certain protocols, polyA primers and variants thereof are employed. Disadvantages of such protocols include the inability to produce target from prokaryotic mRNA species that lack a polyA tail and the propensity of such protocols to produce target that lacks 5′ mRNA information. While the use of random primers overcomes some of these disadvantages, random primer protocols suffer from their own disadvantages, e.g., lack of specificity resulting from increased complexity in the primer mixture produced by the process, where not only mRNA is represented, but also rRNA, tRNA and snRNA. In yet other protocols, custom primer mixes are employed in target generation. While such protocols overcome the above-described disadvantages with polyA and random primer based protocols, custom primer mix or gene specific primer based protocols can be prohibitively expensive, particularly in array-based hybridization protocols in which custom arrays are employed.

As such, there is continued interest in the development of new primer generation protocols. Of particular interest would be the development of a protocol that realizes the advantages of gene specific primer based protocols while at the same time is economical to perform and is therefore suitable for use in custom array-based hybridization assays.

Relevant Literature

See U.S. Pat. No. 5,795,714 and the references cited therein.

SUMMARY OF THE INVENTION

Methods for generating mixtures of nucleic acids, e.g., oligonucleotide primers, are provided. In the subject methods, an array of probe nucleic acids is employed as template to generate mixtures of nucleic acids via a template driven primer extension reaction. In preferred embodiments, each probe on the array employed in the subject methods comprises a constant domain and a variable domain, where the constant domain is further characterized by having at least a recognition domain, and optionally a functional domain and/or linker domain. Also provided are the arrays employed in the subject methods and kits for practicing the subject methods. The subject methods find use in a variety of applications, including the generation of target nucleic acids from an mRNA sample for use in hybridization assays, e.g., differential gene expression analysis.

BRIEF DESCRIPTION OF THE FIGURES

FIG. 1 provides a view of the stained gel produced in Example 1 of the Experimental section, infra.

DEFINITIONS

The term “nucleic acid” as used herein means a polymer composed of nucleotides, e.g., deoxyribonucleotides or ribonucleotides, or compounds produced synthetically (e.g. PNA as described in U.S. Pat. No. 5,948,902 and the references cited therein) which can hybridize with naturally occurring nucleic acids in a sequence specific manner analogous to that of two naturally occurring nucleic acids.

The terms “ribonucleic acid” and “RNA” as used herein mean a polymer composed of ribonucleotides.

The terms “deoxyribonucleic acid” and “DNA” as used herein mean a polymer composed of deoxyribonucleotides.

The term “oligonucleotide” as used herein denotes single stranded nucleotide multimers of from about 10 to 100 nucleotides and up to 200 nucleotides in length.

The term “polynucleotide” as used herein refers to a single or double stranded polymer composed of nucleotide monomers of generally greater than 100 nucleotides in length.

The term “mRNA” means messenger RNA.

The term “array” means a substrate having at least one planar surface on which is immobilized a plurality of different probe nucleic acids.

DESCRIPTION OF THE SPECIFIC EMBODIMENTS

Methods for generating mixtures of nucleic acids, e.g., oligonucleotide primers, are provided. In the subject methods, an array is employed as template to generate mixtures of nucleic acids via a template driven primer extension reaction. In preferred embodiments, each probe on the array employed in the subject methods comprises a constant domain and a variable domain, where the constant domain is further characterized by having at least a recognition domain, and optionally a functional and/or linker domain. Also provided are the arrays employed in the subject methods and kits for practicing the subject methods. The subject methods find use in a variety of applications, including the generation of target nucleic acids from an mRNA sample for use in hybridization assays, e.g., differential gene expression analysis. In further describing the subject invention, the subject methods will be described first, followed by a review of representative protocols in which the nucleic acid mixtures produced by the subject methods find use as well as a description of kits that find use in practicing the subject methods.

Before the subject invention is described further, it is to be understood that the invention is not limited to the particular embodiments of the invention described below, as variations of the particular embodiments may be made and still fall within the scope of the appended claims. It is also to be understood that the terminology employed is for the purpose of describing particular embodiments, and is not intended to be limiting. Instead, the scope of the present invention will be established by the appended claims.

In this specification and the appended claims, the singular forms “a,” “an” and “the” include plural reference unless the context clearly dictates otherwise. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood to one of ordinary skill in the art to which this invention belongs.

Methods

As summarized above, the subject invention provides methods for generating mixtures of nucleic acids by a template driven primer extension protocol in which an array is employed as template. The mixture of nucleic acids produced by the subject methods is characterized by having a known composition. As such, at least the sequence of each individual or distinct nucleic acid in the mixture of differing sequence is known. In many embodiments, the relative amount or copy number of each distinct nucleic acid of differing sequence is known. Each nucleic acid present in the mixture at least includes a variable domain that serves to distinguish it from any other nucleic acid in the mixture, i.e., any other nucleic acid that does not have the identical sequence (any nucleic acid that is not its copy). The variable domain, S_(ij), is a nucleic acid that hybridizes under stringent conditions to gene i at location j and is capable of serving as a primer in reverse transcription beginning at base j. The number of different variable domains, S_(ij), present in the mixture may vary, but is generally at least about 10, usually at least about 20 and more usually at least about 50, where the number may be as great as 25,000 or greater. In many embodiments, the number of different variable domains present in the mixture ranges from about 1,978 to 25,000, usually from about 4,200 to 8,400. In addition to the distinguishing variable domain, the constituent members of the mixture may all share one or more domains of common sequence, depending on the particular protocol employed to generate the mixture, as described in greater detail below.

In the subject methods, the first step is generally to provide an array, i.e., a substrate having a planar surface on which is immobilized a plurality of distinct nucleic acid probes, in which each probe sequence on the array includes a constant domain and a complement variable domain. This providing step may include either generating the array de novo or obtaining a pre-made array from a commercial source, where in either case the array will have the characteristics described below. Arrays of nucleic acids are known in the art, where representative arrays that may be modified to become arrays of the subject invention as described below, include those described in: U.S. Pat. Nos. 5,242,974; 5,384,261; 5,405,783; 5,412,087; 5,424,186; 5,429,807; 5,436,327; 5,445,934; 5,472,672; 5,527,681; 5,529,756; 5,545,531; 5,554,501; 5,556,752; 5,561,071; 5,599,695; 5,624,711; 5,639,603; 5,658,734; 5,795,714; WO 93/17126; WO 95/11995; WO 95/35505; EP 742 287; and EP 799 897; the disclosures of which are herein incorporated by reference.

As mentioned above, each distinct probe nucleic acid on the array includes a constant domain and a complement variable domain. The complement variable domain of each distinct probe has a sequence that is the complement of a variable or distinguishing domain found in a constituent member of the mixture of nucleic acids that is produced by the subject methods as described above, where by complement is meant that the variable and complement variable sequences hybridize under stringent conditions, e.g., at 50° C. or higher and 0.1×SSC (15 mM sodium chloride/1.5 mM sodium citrate) or thermodynamically equivalent conditions. Thus, the array includes a plurality of distinct probes that differ from each other by complement variable domain, where the number of distinct probes on an array employed in the subject methods is typically at least 10, usually at least 20 and more usually at least 50, where the number may be as high as 25,000 or higher. In many embodiments, the number of distinct probes ranges from about 1,978 to 25,000, usually from about 4,200 to 8,400.

Because of the nature of the subject methods, as described below, each distinct complement variable domain will be represented in the nucleic acid mixture produced using the array, i.e., the complement of each distinct complement variable domain sequence will be found in the mixture of nucleic acids produced by the subject methods. For example, where an array has 10 different probes that differ by complement variable domain such that it has 10 different complement variable domains, i.e., cV₁₋₁₀, the nucleic acid mixture produced by the subject methods as described below will have 10 different or distinct nucleic acids, where each different nucleic acid sequence in the mixture includes a sequence that is the complement of one of cV₁₋₁₀, i.e., V₁₋₁₀.
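By way of illustration only, the one-to-one relationship between the complement variable domains on the array and the variable domains in the resulting mixture may be sketched in Python as follows. The function names and the two 20 nt sequences are hypothetical and are not taken from this specification; the sketch also assumes that "complement" can be modeled as an exact Watson-Crick reverse complement, which is narrower than hybridization under stringent conditions.

```python
# Watson-Crick pairing for the four DNA bases.
_PAIR = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(_PAIR)[::-1]


def mixture_from_array(complement_variable_domains: dict[str, str]) -> dict[str, str]:
    """Map each cV_k on the array to the V_k expected in the product mixture."""
    return {
        name.removeprefix("c"): reverse_complement(seq)
        for name, seq in complement_variable_domains.items()
    }


# Hypothetical 20 nt complement variable domains (made-up sequences).
array_cv = {
    "cV1": "ATGGCGTACGTTAGCCTAGA",
    "cV2": "GGCTTAACCGATCGTAGCTA",
}
mixture = mixture_from_array(array_cv)
for name, seq in mixture.items():
    print(name, seq)
```

Each distinct cV on the array yields exactly one distinct V in the mixture, so an array with 10 complement variable domains would yield 10 distinct variable domains.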

The relative copy number of each probe on the array may or may not be selected to “normalize” the nucleic acid mixture made with the array with respect to the mRNA sample with which it is to be used. For example, if the array is to be used to make a nucleic acid mixture that has a 10-fold increase in the copy number of target that hybridizes to a rare mRNA, the copy number of the corresponding (e.g. identical or complementary) probe on the array can be appropriately increased relative to other probes that correspond to less rare mRNA species in the mRNA sample. In many embodiments, the complement variable domain is a domain that has a sequence that is chosen to hybridize under stringent conditions to a sequence of interest found in a particular mRNA. In many embodiments, the complement variable sequence has a sequence that is denoted as cS_(ij), where c stands for complement and S_(ij) is a nucleic acid that primes reverse transcription of a gene i beginning at base j. Thus, in many embodiments of the invention, the complement variable domain of each probe is the complement of a nucleic acid that is capable of hybridizing to a different gene of interest i at location or base j and acting as a primer under reverse transcription conditions. For example, where 10 different genes, i.e., genes 1 to 10, are represented on the array and the sequence of interest for each gene begins at base number 50, 60, 70, 80, 90, 100, 110, 120, 130 and 140, respectively (counting from the 5′ end of the mRNA molecule), and each complement variable domain is 20 bases long, the complement variable domains of each distinct probe on the array, i.e., cV₁ to cV₁₀, will be as follows:

Variable Domain    Sequence
cV₁     Sequence that hybridizes under stringent conditions to bases 50 to 30 of gene 1
cV₂     Sequence that hybridizes under stringent conditions to bases 60 to 40 of gene 2
cV₃     Sequence that hybridizes under stringent conditions to bases 70 to 50 of gene 3
cV₄     Sequence that hybridizes under stringent conditions to bases 80 to 60 of gene 4
cV₅     Sequence that hybridizes under stringent conditions to bases 90 to 70 of gene 5
cV₆     Sequence that hybridizes under stringent conditions to bases 100 to 80 of gene 6
cV₇     Sequence that hybridizes under stringent conditions to bases 110 to 90 of gene 7
cV₈     Sequence that hybridizes under stringent conditions to bases 120 to 100 of gene 8
cV₉     Sequence that hybridizes under stringent conditions to bases 130 to 110 of gene 9
cV₁₀    Sequence that hybridizes under stringent conditions to bases 140 to 120 of gene 10
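The base ranges in the foregoing example follow a simple arithmetic pattern, which may be sketched as below. The function name and its parameters are hypothetical; the sketch merely reproduces the listed ranges, with each range running from the starting base j down to base j minus 20 as written in the example.

```python
def example_ranges(n_genes: int = 10, first_start: int = 50,
                   step: int = 10, span: int = 20) -> list[tuple[str, int, int, int]]:
    """Return (domain name, gene i, start base j, end base) for each probe."""
    rows = []
    for i in range(1, n_genes + 1):
        j = first_start + step * (i - 1)   # sequence of interest begins at base j
        rows.append((f"cV{i}", i, j, j - span))
    return rows


for name, gene, start, end in example_ranges():
    print(f"{name}: bases {start} to {end} of gene {gene}")
```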

While the length of the complement variable domain in the specific example provided above is 20 bases or residues, i.e., 20 nt, the length may vary considerably and will be chosen based on the desired length of the resultant nucleic acids in the mixture to be produced, within the synthesis constraints of the subject method. Generally, the length of the complement variable domain will range from about 15 to 40, usually from about 15 to 30 and more usually from about 20 to 25 nt.

As mentioned above, in addition to the unique complement variable domain, each probe nucleic acid present on the array includes a common or shared constant domain 3′ of the complement variable domain. This constant domain typically ranges in length from about 20 to 50, usually from about 20 to 45 and more usually from about 25 to 40 nt. The constant domain typically comprises at least one of the following constant sub-domains: a functional domain; a recognition domain and a linker domain. In many embodiments, each probe contains at least a recognition sub-domain, and optionally a functional domain and/or a linker domain. These constant sub-domains may be grouped together on the probe or separated so as to flank the variable domain of the probe. As such, in certain embodiments these sub-domains are generally arranged in the order of functional domain, recognition domain and linker domain going from the 5′ to the 3′ end of the probe sequence, such that the linker domain is at the 3′ probe terminus and is attached, either directly or indirectly, to the substrate surface of the array. In yet other embodiments, one or more of the domains, e.g., the functional sub-domain, may be present on the 5′ end of the variable domain.

The optional functional sub-domain is generally a sequence that imparts or contributes some function to a duplex nucleic acid in which it is present. Functional domains of interest include: polymerase promoter sites, e.g., T3 or T7 RNA polymerase promoter sites, sequences unique with respect to the intended target organism for the array experiment (i.e. unique priming sites) and the like. The length of this functional domain typically ranges from about 10 nt to 40 nt, usually from about 20 nt to 30 nt.

The recognition sequence of the constant domain is typically a sequence that, when present in duplex format, is recognized and cleaved by a restriction endonuclease. A large number of restriction endonucleases are known to those of skill in the art. Specific restriction endonuclease recognized sites of interest that may make up the subject recognition sequence include, but are not limited to: Hinc II and the like. Generally, the length of the recognition domain ranges from about 4 nt to 8 nt, usually from about 5 nt to 6 nt.

The linker sub-domain of the subject constant domains is optional. The linker domain may be any convenient sequence, including random sequence or a non-polynucleotide chemical linker (e.g. an ethylene glycol-based polyether oligomer), where the sole purpose of the linker domain is to project the other domains of the probe away from the substrate surface. Generally, the linker domain, if present, has a length ranging from about 1 to 20, usually from about 1 to 15 and more usually from about 1 to 10, including 5 to 10 nt.

In many, though not all, embodiments, each surface bound probe on the array employed in the subject methods is described by the following formula:

surface-3′-L-R-F-cV-5′

wherein:

- L is the optional linking domain;
- R is the recognition domain;
- F is the functional domain; and
- cV is the complement variable domain, i.e., the complement of the variable domain, cS_(ij), of the nucleic acid produced by the subject methods to which it hybridizes under stringent conditions;

where each of these elements is as described above.

As mentioned above, the subject arrays are provided by any convenient means, including obtaining them from a commercial source or by synthesizing them de novo. To synthesize the arrays employed in the subject methods, the first step is generally to determine the nature of the mixture of nucleic acids that is to be produced using the subject array according to the subject methods. In those embodiments where the nucleic acid mixture is to be employed as gene specific primer in the generation of target nucleic acid, as described in greater detail below, the first step is to identify those genes that are to be represented by a primer in the primer mixture, i.e., those specific mRNAs potentially present in the experimental samples which are to have primers in the mixture that are capable of hybridizing to them under stringent conditions. Following identification of these genes, the specific region, i.e. stretch or domain, of each mRNA to which the primer is to hybridize is then identified. These specific domains or regions may be identified using any convenient protocol and set of selection criteria, where of interest in many embodiments is the use of the algorithm and selection methods based thereon described in U.S. patent application Ser. No. 09/021,701, the disclosure of which is herein incorporated by reference. As such, a plurality of different sequences of interest will be identified, wherein each sequence is described by the formula S_(ij), where i is the gene of interest and j is the specific base at which the sequence starts, as described above. Following identification of each variable or S_(ij) sequence as described above, a probe sequence for each different variable or S_(ij) sequence is identified, where the probe sequence has the following sequence in many embodiments:

3′-L-R-F-cV-5′

wherein:

- L is the linking domain;
- R is the recognition domain;
- F is the functional domain; and
- cV is the complement of the variable domain, i.e., cS_(ij);

where each of these elements is as defined above and each of the probes varies only in terms of its cV domain.
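The probe design step just described may be illustrated with the following minimal Python sketch. It assumes that each domain is supplied as a DNA string written 5′ to 3′ as it appears on the probe strand, and that cV can be modeled as the exact reverse complement of S_(ij). The class name, function name and all sequences are hypothetical placeholders, apart from GTTAAC, which is used here as one sequence matching the Hinc II recognition pattern GTYRAC.

```python
from dataclasses import dataclass

_PAIR = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(_PAIR)[::-1]


@dataclass(frozen=True)
class ConstantDomain:
    functional: str    # F, written 5'->3' on the probe strand
    recognition: str   # R, e.g. a restriction endonuclease site
    linker: str = ""   # L, optional spacer at the surface-bound 3' end


def design_probe(s_ij: str, const: ConstantDomain) -> str:
    """Return the probe 5'->3' as cV-F-R-L, i.e. surface-3'-L-R-F-cV-5'."""
    cv = reverse_complement(s_ij)   # cV is the complement of the variable domain
    return cv + const.functional + const.recognition + const.linker


# Hypothetical constant domain shared by every probe on the array.
const = ConstantDomain(functional="ACGTTGCATGCAAGCTTGCA",
                       recognition="GTTAAC",
                       linker="TTTTT")
# Hypothetical 20 nt gene specific primer sequence S_ij.
probe = design_probe("TCTAGGCTAACGTACGCCAT", const)
print(probe)
```

Because the constant domain is shared, probes generated this way for different S_(ij) sequences differ only in their first 20 nucleotides, consistent with the statement that each probe varies only in its cV domain.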

Following identification of the probe sequences as defined above, an array is produced in which each of the probe sequences of the identified set is present. The array may be produced using any convenient protocol, where suitable protocols include both synthesis of the complement probe followed by deposition onto a substrate surface, as well as synthesis of the probe directly on the substrate surface. Representative protocols for array synthesis are described in: U.S. Pat. Nos. 5,242,974; 5,384,261; 5,405,783; 5,412,087; 5,424,186; 5,429,807; 5,436,327; 5,445,934; 5,472,672; 5,527,681; 5,529,756; 5,545,531; 5,554,501; 5,556,752; 5,561,071; 5,599,695; 5,624,711; 5,639,603; 5,658,734; 5,795,714; WO 93/17126; WO 95/11995; WO 95/35505; EP 742 287; and EP 799 897; the disclosures of which are herein incorporated by reference.

Following provision of the array employed in the subject methods, as described above, the next step is to contact the array with universal primer under hybridization conditions sufficient to produce a template array that includes a plurality of overhang comprising duplex nucleic acids on its surface, where the overhang is made up of the complement variable domain of each probe of the array. The universal primer is capable of hybridizing to the constant domain, or at least a portion thereof (e.g., at least that portion immediately 3′ of the complement variable domain). The universal primer has a length that is sufficient to prime template driven primer extension, where the length of the universal primer generally ranges from about 10 to 45 nt, usually from about 15 to 35 nt and more usually from about 20 to 30 nt. In many embodiments, the universal primer is the complement of the recognition and/or functional sub-domains of the constant domain of each probe on the array. As such, in many embodiments the universal primer employed has a sequence described by the formula:

5′-cR-cF-3′

wherein:

- cR is the complement of the recognition domain; and
- cF is the complement of the functional domain.
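As a hedged illustration of how a universal primer of the formula 5′-cR-cF-3′ relates to the probe, the following Python sketch derives the primer from hypothetical R and F sequences, locates its binding site on a probe, and extends it across the overhanging cV domain. It assumes exact Watson-Crick pairing and idealized extension to the 5′ end of the template; the function names and sequences are placeholders, not part of this specification.

```python
_PAIR = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(_PAIR)[::-1]


def universal_primer(functional: str, recognition: str) -> str:
    """Return 5'-cR-cF-3', the strand that pairs with F-R on the probe."""
    return reverse_complement(recognition) + reverse_complement(functional)


def extend_primer(probe_5to3: str, primer: str) -> str:
    """Extend the annealed primer to the 5' end of the probe template."""
    site = probe_5to3.find(reverse_complement(primer))
    if site < 0:
        raise ValueError("primer does not anneal to this probe")
    overhang = probe_5to3[:site]            # single-stranded cV overhang
    return primer + reverse_complement(overhang)


# Hypothetical domains, written 5'->3' as they appear on the probe strand.
F, R, L = "ACGTTGCATGCAAGCTTGCA", "GTTAAC", "TTTTT"
s_ij = "TCTAGGCTAACGTACGCCAT"
probe = reverse_complement(s_ij) + F + R + L    # 5'-cV-F-R-L-3'

primer = universal_primer(F, R)
product = extend_primer(probe, primer)          # 5'-cR-cF-S_ij-3'
print(primer)
print(product)
```

In this idealized model the extension product carries the gene specific sequence S_(ij) at its 3′ end, preceded by the shared cR and cF sequences.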

As mentioned above, the template array produced by this method is an array of duplex probe molecules made up of a first nucleic acid having a constant and complement variable domain and a second nucleic acid which is the universal primer and is hybridized to the constant domain (or at least that portion of the constant domain that is 3′ of the variable domain complement). As such, the array produced by this step is an array of overhang comprising duplex nucleic acid, typically DNA, molecules, where the overhang is made up of the complement variable domain of each probe on the array.

This template array of overhang comprising duplex probes is then subjected to primer extension reaction conditions sufficient to produce the desired mixture of nucleic acids. The specific primer extension reaction conditions to which the template array of overhang comprising duplex nucleic acids is subjected may vary depending on the particular protocol used and/or the specific nature of the nucleic acid mixture to be produced therefrom. Specific primer extension reaction conditions of interest include, but are not limited to: linear PCR (Polymerase Chain Reaction); strand displacement amplification; and in vitro transcription. Each of these specific primer extension reaction conditions is now reviewed in greater detail.

Where the template array is subjected to linear PCR conditions, the array is contacted in an aqueous reaction mixture with a source of DNA polymerase, dNTPs and any other desired or requisite primer extension reagents under conditions sufficient to produce linearly amplified amounts of nucleic acids, e.g., under thermal cycling conditions. As such, the polymerase employed in the subject methods is generally, though not necessarily (e.g., where new polymerase is added after each cycle), a thermostable polymerase. A variety of thermostable polymerases are known to those of skill in the art, where representative polymerases include, but are not limited to: Taq polymerase, Vent® polymerase, Pfu polymerase and the like. The amount of polymerase present in the reaction mixture may vary but is sufficient to provide for the requisite amount of polymerase activity, where the specific amount employed may be readily determined by those of skill in the art. Also present in the reaction mixture is a collection of the four dNTPs, i.e., dATP, dCTP, dGTP and dTTP. The dNTPs may be present in varying or equimolar amounts, where the amount of each dNTP typically ranges from about 10 μM to 10 mM, usually from about 100 μM to 300 μM. Other reagents that may be present in the reaction mixture include: monovalent cations (e.g. Na⁺), divalent cations (e.g. Mg⁺⁺), buffers (e.g. Tris), surfactants (e.g. Triton X-100) and the like. In this linear PCR embodiment of the subject methods, the reaction mixture is subjected to thermal cycling conditions in which the temperature of the reaction mixture is cycled through annealing, primer extension and dissociation temperatures in a manner that results in the production of linearly amplified amounts of nucleic acid for each different sequence probe on the template array. The annealing temperature typically ranges from about 50° C. to 80° C., usually from about 60° C. to 75° C. and is maintained for a period of time ranging from about 10 sec. to 10 min., usually from about 30 sec. to 2 min. The primer extension temperature typically ranges from about 55° C. to 75° C., usually from about 60° C. to 70° C. and is maintained for a period of time ranging from about 30 sec. to 10 min., usually from about 1 min. to 5 min. The dissociation temperature typically ranges from about 80° C. to 99° C., usually from about 90° C. to 95° C. and is maintained for a period of time ranging from about 1 sec. to 2 min., usually from about 30 sec. to 1 min.
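The thermal cycling ranges stated above lend themselves to a simple configuration check, sketched below in Python. The class and function names are hypothetical, the example set points are arbitrary values chosen from within the stated ranges rather than recommended conditions, and the run time ignores ramping between temperatures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str        # "annealing", "extension" or "dissociation"
    temp_c: float    # set-point temperature in degrees Celsius
    seconds: float   # hold time in seconds


# Broad ranges quoted in the text: (min temp, max temp, min hold, max hold).
BROAD_RANGES = {
    "annealing": (50.0, 80.0, 10.0, 600.0),
    "extension": (55.0, 75.0, 30.0, 600.0),
    "dissociation": (80.0, 99.0, 1.0, 120.0),
}


def within_broad_range(step: Step) -> bool:
    """Check a step against the broad temperature and time ranges above."""
    t_lo, t_hi, s_lo, s_hi = BROAD_RANGES[step.name]
    return t_lo <= step.temp_c <= t_hi and s_lo <= step.seconds <= s_hi


def total_run_seconds(program: list[Step], cycles: int) -> float:
    """Total hold time for the given number of cycles (ramp times ignored)."""
    return cycles * sum(step.seconds for step in program)


# Arbitrary example program; the values are illustrative only.
program = [
    Step("annealing", 65.0, 60.0),
    Step("extension", 68.0, 120.0),
    Step("dissociation", 94.0, 45.0),
]
for step in program:
    print(step.name, within_broad_range(step))
print(total_run_seconds(program, cycles=30), "seconds of hold time")
```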

In strand displacement amplification, the array of overhang comprising duplex nucleic acids is employed as primed template in linear amplification variations of the exponential amplification protocols described in Walker et al., Nucleic Acids Res. (1992) 20:1691-1696 and Walker et al., Proc. Nat'l Acad. Sci. USA (1992) 89:392-396; as well as in U.S. Pat. No. 5,648,211; the disclosures of which are herein incorporated by reference. Briefly, isothermal linear amplification is achieved as follows. Following production of the array of overhang comprising duplex nucleic acids, the template array is subjected to a cycle of strand nicking of the universal primer after sequence cR, typically by using a restriction endonuclease. Generally, the template strand or probe sequence is protected via an appropriately placed phosphorothioate linkage in the surface-bound template strand. Extension of the 3′ end exposed by the nick is then allowed to proceed by using a DNA polymerase that lacks a 5′→3′ exonuclease activity but possesses a strand displacement activity, e.g., Klenow fragment. Each cycle in this protocol releases a nucleic acid molecule which has the formula: 5′-cF-Sij-3′. In certain variants of this method, nicking may be achieved by making R a half-site for a restriction endonuclease that exhibits single-strand cleavage activity, or by employing a nicking endonuclease, such as N.BstNBI, and the like.
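The nick, extend and displace cycle described above may be modeled very simply as string slicing, as in the following sketch. The function name and sequences are hypothetical; the model assumes the nick falls immediately 3′ of cR on the primer strand, that every cycle goes to completion, and that extension from the nick regenerates the same strand.

```python
from typing import Iterator


def sda_products(c_r: str, c_f: str, s_ij: str, cycles: int) -> Iterator[str]:
    """Yield the strand released in each nick/extend/displace cycle."""
    primed_strand = c_r + c_f + s_ij    # extended universal primer, 5'->3'
    nick_site = len(c_r)                # nick falls immediately 3' of cR
    for _ in range(cycles):
        # Extension from the nick displaces everything 3' of it and
        # regenerates the same duplex, ready for the next cycle.
        yield primed_strand[nick_site:]


# Hypothetical sequences (cR shown as a 6 nt recognition complement).
released = list(sda_products(c_r="GTTAAC",
                             c_f="TGCAAGCTTGCATGCAACGT",
                             s_ij="TCTAGGCTAACGTACGCCAT",
                             cycles=3))
for strand in released:
    print(strand)   # each is 5'-cF-Sij-3'
```

Under these assumptions the number of released strands per surface-bound template grows linearly with the number of cycles.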

In yet other embodiments, the subject template array of duplex nucleic acids is employed in an in vitro transcription method. In this embodiment, the template array is modified from that described above to be of the following formula:

(surface)-L-R-(C)Sij-F-5′

wherein:

- L and R are as defined above;
- F is an RNA polymerase promoter, e.g., T3 or T7 promoter; and
- (C)Sij is Sij modified to end in a C residue.

The universal primer employed with this array has the formula 5′-cR-3′. When the template array is contacted with NTPs, T3 or T7 polymerase and the appropriate transcription buffer, ribonucleic acids of the formula 5′-(rG)rcSij-rcF-3′ are produced, where r stands for ribonucleotide. By contacting this resultant mixture of ribonucleic acids with the DNA primer 5′-F-3′ and a reverse transcriptase, a mixture of deoxyribonucleic acids suitable for use as primer in target generation protocols is produced.

The subject template arrays may also be used in other nucleic acid primer extension generation protocols, the above being merely representative of the protocols in which the subject template arrays find use.

The above described array template based primer extension generation methods result in the production of a mixture of nucleic acids, typically a mixture of deoxyribonucleic acids, where each of the different complement variable domains of the template array is represented in the mixture, i.e., there is at least one nucleic acid in the mixture that has a variable domain that hybridizes under stringent conditions to each different complement variable domain present on the array. The length of each of the nucleic acids present in the resultant mixture typically ranges from about 20 to 60 nt, usually from about 25 to 55 nt and more usually from about 30 to 50 nt. Because of the manner in which the subject mixtures of nucleic acids are produced, the resultant mixtures of nucleic acids may be viewed as mixtures of gene specific primers, where the gene specific primers are specific for each of the different genes represented on the template array employed in the production of the nucleic acid mixture. In certain embodiments, the mixture may be “normalized” with respect to a given mRNA population, as described above.
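One possible way to choose relative probe copy numbers for a "normalized" mixture is sketched below. The inverse-proportional rule, the function name, the gene names and the abundance values are all assumptions made for illustration; this specification states only that the copy number of a probe corresponding to a rare mRNA can be appropriately increased.

```python
def normalized_copy_numbers(expected_abundance: dict[str, float],
                            base_copies: int = 1000) -> dict[str, int]:
    """Assign probe copies inversely proportional to expected mRNA abundance.

    The most abundant species receives base_copies; rarer species receive
    proportionally more (an illustrative rule, not one stated in the text).
    """
    most_abundant = max(expected_abundance.values())
    return {
        gene: round(base_copies * most_abundant / abundance)
        for gene, abundance in expected_abundance.items()
    }


# Hypothetical relative abundances: geneB is 10-fold rarer than geneA.
copies = normalized_copy_numbers({"geneA": 100.0, "geneB": 10.0})
print(copies)
```

With these example inputs the probe for the 10-fold rarer species is allotted 10-fold more copies, mirroring the 10-fold example given earlier in this description.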

Utility

The nucleic acid mixtures produced by the subject methods find use in avariety of different applications, and are particularly suited for useas primers in the generation of target nucleic acids, e.g., for arraybased differential gene expression analysis applications. Where thesubject nucleic acids mixtures are used as primers for target generationin gene expression analyses, the first step is to generate a populationof target nucleic acids from an initial mRNA source or sample. By targetnucleic acid is meant a nucleic acid that has a sequence, e.g., S_(ij),which is either the same as, or complementary to, the sequence of anmRNA found in an initial sample, where the target may be DNA or RNA andbe present in amplified amounts as compared to the initial amount ofmRNA, depending on the particular target generation protocol that isemployed.

In the subject methods, the target or image nucleic acids are producedfrom the subject nucleic acid mixtures generally through enzymaticgeneration protocols. Specifically, the target nucleic acids aretypically produced using template dependent polymerization protocols andan initial mRNA source. The initial mRNA source may be present in avariety of different samples, where the sample will typically be derivedfrom a physiological source. The physiological source may be derivedfrom a variety of eukaryotic or prokaryotic sources, with physiologicalsources of interest including sources derived from single-celledorganisms such as yeast and multicellular organisms, including plantsand animals, particularly mammals, where the physiological sources frommulticellular organisms may be derived from particular organs or tissuesof the multicellular organism, or from isolated cells derived therefrom.In obtaining the sample of RNA to be analyzed from the physiologicalsource from which it is derived, the physiological source may besubjected to a number of different processing steps, where suchprocessing steps might include tissue homogenization, cell isolation andcytoplasm extraction, nucleic acid extraction and the like, where suchprocessing steps are known to those of skill in the art. Methods ofisolating RNA from cells, tissues, organs or whole organisms are knownto those of skill in the art and are described in Maniatis et al.(1989), Molecular Cloning: A Laboratory Manual 2d Ed. (Cold SpringHarbor Press).

A number of different enzymatic protocols for generating image or target nucleic acids from an initial mRNA sample are known and continue to be developed. Any convenient protocol may be employed, where the particular protocol employed depends, at least in part, on a number of factors, including: whether one wants to generate amplified amounts of target or image nucleic acid; whether one wants to generate geometrically or linearly amplified amounts of target nucleic acid; whether bias in the amount of target can be tolerated, etc. A common feature of the protocols that find use in preparing the image or target nucleic acids of the subject invention is the use of the subject nucleic acid mixtures produced using array-based template protocols described above as primer.

A number of nucleic acid amplification methods can be employed to generate the target nucleic acid from an initial mRNA source, where these methods can employ the subject nucleic acid mixtures as primer. Such methods include the “polymerase chain reaction” (PCR) as described in U.S. Pat. No. 4,683,195, the disclosure of which is herein incorporated by reference, and a number of transcription-based exponential amplification methods, such as those described in U.S. Pat. Nos. 5,130,238; 5,399,491; and 5,437,990; the disclosures of which are herein incorporated by reference. Each of these methods uses primer-dependent nucleic acid synthesis to generate a DNA or RNA product, which serves as a template for subsequent rounds of primer-dependent nucleic acid synthesis. Each process uses (at least) two primer sequences complementary to different strands of a desired nucleic acid sequence and results in an exponential increase in the number of copies of the target sequence.

Alternatively, amplification methods that utilize a single primer may be employed to generate target or image nucleic acids from an initial mRNA sample, where the subject nucleic acid mixtures are employed as primer. See, e.g., U.S. Pat. Nos. 5,554,516 and 5,716,785, the disclosures of which are herein incorporated by reference. The methods reported in these patents utilize a single primer containing an RNA polymerase promoter sequence and a sequence complementary to the 3′-end of the desired nucleic acid target sequence(s) (“promoter-primer”). In both methods, the promoter-primer is added under conditions where it hybridizes to the target sequence(s) and is converted to a substrate for RNA polymerase. In both methods, the substrate intermediate is recognized by RNA polymerase, which produces multiple copies of RNA complementary to the target sequence(s) (“cRNA”).
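The distinction drawn above between geometric (exponential) and linear amplification can be illustrated with a short calculation. The following Python sketch is offered by way of illustration only; the function names, the per-cycle efficiency parameter and the example numbers are assumptions introduced here and do not come from the cited patents. It assumes an idealized reaction in which, for the two-primer case, every product serves as template in the next cycle, and, for the single-primer case, only the original templates are copied.

```python
def exponential_copies(initial_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Idealized two-primer amplification.

    Every strand present serves as template in the next cycle, so the
    population is multiplied by (1 + efficiency) per cycle.
    efficiency = 1.0 is the (hypothetical) perfect-doubling case.
    """
    return initial_copies * (1.0 + efficiency) ** cycles


def linear_products(initial_templates: float, rounds: int, copies_per_round: float = 1.0) -> float:
    """Idealized single-primer amplification.

    Only the original templates are copied; products are not themselves
    templates, so product accumulates in proportion to the number of rounds.
    """
    return initial_templates * copies_per_round * rounds


if __name__ == "__main__":
    # Hypothetical example: 100 starting templates, 10 cycles or rounds.
    print(exponential_copies(100, 10))
    print(linear_products(100, 10))
```

Under these simplifying assumptions the exponential case grows as a power of the cycle number while the linear case grows in direct proportion to it; actual yields depend on the protocol and reaction conditions.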

Whatever process is employed to generate the target nucleic acid, where representative protocols have been provided immediately above, the process may be modified to include the use of chemical analogs of nucleotides that have been modified to include a label moiety, e.g., an organic fluorophore, an isotopic label, a capture ligand, e.g., biotin, etc. As a result, the target nucleic acids produced using the subject nucleic acid mixtures as primers often are labeled, either directly or indirectly, for use in subsequent hybridization assays.

The above target generation protocols are merely representative and by no means inclusive of all of the different types of protocols in which the subject nucleic acid mixtures find use as primers.

The resultant populations of target nucleic acids find use as, inter alia, target in hybridization assays, such as gene expression analysis applications. Gene expression analysis protocols are well known to those of skill in the art, and the populations of target nucleic acids produced by the subject methods find use in many, if not all, of these protocols. In gene expression analysis protocols using the subject populations of labeled target, the population of labeled target is typically contacted with a population of probe nucleic acids, e.g., on an array, under hybridization conditions, usually stringent hybridization conditions. The array may be the same array that is used as the template array or a different array. Following hybridization, non-bound target is removed or separated from the probe, e.g., by washing. Washing results in a pattern of hybridized target, which may be read using any convenient protocol, e.g., with a fluorescent scanner device. From this pattern, information regarding the mRNA expression profile in the initial mRNA sample from which the target population was produced may be readily derived or deduced.

In certain embodiments, the subject methods include a step of transmitting data from at least one of the detecting and deriving steps, as described above, to a remote location. By “remote location” is meant a location other than the location at which the array is present and hybridization occurs. For example, a remote location could be another location (e.g., office, lab, etc.) in the same city, another location in a different city, another location in a different state, another location in a different country, etc. The data may be transmitted to the remote location for further evaluation and/or use. Any convenient telecommunications means may be employed for transmitting the data, e.g., facsimile, modem, internet, etc.

Kits

Also provided by the subject invention are kits for use in preparing the subject target populations of nucleic acids. The kits may comprise containers, each with one or more of the various reagents (typically in concentrated form) utilized in the methods, including, for example, buffers, dNTPs, reverse transcriptase, etc., where the kits will at least include a sufficient amount of universal primer, e.g., an amount ranging from about 25 pmol to 25 μmol. In addition, the subject kits may include an array of single stranded probe nucleic acids (or a means for producing the same) wherein each probe has a constant region and complement variable region, as described above. Where the kit has a means for producing the template array, the kit typically includes a substrate having a planar surface, and one or more reagents necessary for synthesis of the probes, which may vary depending on the nature of the protocol to be used to generate the array. The kits may further include reagents necessary for producing labeled target nucleic acids, where such reagents may include reverse transcriptase, labeled dNTPs, etc. A set of instructions will also typically be included, where the instructions may be associated with a package insert and/or the packaging of the kit or the components thereof.

The following examples are offered by way of illustration and not by way of limitation.

Experimental EXAMPLE

In order to demonstrate the feasibility of using an oligonucleotide array as a template for enzymatic polynucleotide synthesis, the following experiment was performed:

1. An in situ oligonucleotide array was manufactured; the array contained 8455 (89×95) features (~100 μm diameter) with the following sequence: (SEQ ID NO:01) 5′-CTTTCTTGGATCAACCCGCTCAATGCTCCCTATAGTGAGTCGTATTACAATTCATTTTTT-surface

In the above sequence, the large dash underlines indicate the unique sequence cS_(ij), the small dashes indicate the recognition/functional sequence F-R (in this case, a T7 RNA polymerase promoter) and the continuous underline indicates a linker sequence Q.

2. The array was hybridized for 1 hour at 60° C. to the following oligonucleotide (PT7, 250 nM): 3′-GATATCACTCAGCATAATGTTAAGTA-5′ (SEQ ID NO:02), i.e., the complementary strand of the T7 promoter portion of the oligonucleotide on the surface. The purpose of this treatment was to produce a double-stranded T7 promoter, which is necessary for T7 RNA polymerase activity (note that a double-stranded template strand is not necessary; a 5′-overhanging single-stranded template is known to be sufficient).

3. The array was washed briefly with ice-cold water (to remove salts from the hybridization buffer) and blown dry with nitrogen. The hybridization chamber was reassembled and filled with a transcription mixture (250 μl) containing T7 transcription buffer (including NTPs), T7 RNA polymerase, 1% Triton X-100 and the oligonucleotide of step 2 (250 nM). The assembly was incubated overnight at 40° C. An identical positive control array was also incubated in contact with the same transcription mixture, with a soluble version of the array-bound oligonucleotide of step 1 added (HCV185; 250 nM). Finally, a second positive control mixture was incubated in a PCR tube.

4. The transcription mixtures were removed from the experimental and positive control arrays. Half of each array mixture was concentrated >10× using a Microcon-3 ultrafiltration concentrator.

5. The various samples were analyzed on a 15% polyacrylamide/4M urea gel, stained with ethidium bromide and visualized by fluorescence. The results are provided in FIG. 1.
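The relationship between the array-bound oligonucleotide of step 1 and the PT7 oligonucleotide of step 2 can be checked computationally. The following Python sketch is illustrative only; the variable and function names are assumptions introduced here. It locates the region of SEQ ID NO:01 to which PT7 is complementary and reports the single-stranded 5′ portion and the 3′ surface-proximal portion that remain unpaired. Reading the unpaired 5′ portion as the unique sequence and the 3′ portion as part of the linker is an interpretation made for this sketch, since the underlining referred to above is not reproduced in the text.

```python
# Sequences copied from steps 1 and 2 of the example.
PROBE_5_TO_3 = "CTTTCTTGGATCAACCCGCTCAATGCTCCCTATAGTGAGTCGTATTACAATTCATTTTTT"  # SEQ ID NO:01
PT7_3_TO_5 = "GATATCACTCAGCATAATGTTAAGTA"  # SEQ ID NO:02, written 3'->5' as in the text

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


# Rewrite PT7 in the conventional 5'->3' orientation.
pt7_5_to_3 = PT7_3_TO_5[::-1]

# The probe region that pairs with PT7 is the reverse complement of PT7.
binding_site = reverse_complement(pt7_5_to_3)
start = PROBE_5_TO_3.find(binding_site)  # -1 would mean no perfect match
end = start + len(binding_site)

overhang_5p = PROBE_5_TO_3[:start]  # single-stranded 5' portion (template overhang)
tail_3p = PROBE_5_TO_3[end:]        # unpaired bases next to the surface

if __name__ == "__main__":
    print("probe length:", len(PROBE_5_TO_3))
    print("duplex region:", start, "to", end, "(0-based, end exclusive)")
    print("5' overhang:", overhang_5p, len(overhang_5p), "nt")
    print("3' unpaired tail:", tail_3p)
```

A perfect match confirms that hybridization of PT7 yields a duplex promoter region flanked by a 5′-overhanging single-stranded template, consistent with the note in step 2.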

The results provided in FIG. 1 clearly show visible transcript in the concentrated experimental array sample (lane 2). Separate negative control experiments demonstrated that reactions which omitted the complementary oligonucleotide PT7 or the T7 RNA polymerase did not produce visible bands on a similar gel (data not shown). Microcon concentration of ~80 μl of 250 nM PT7 oligo also failed to yield a visible band on a similar gel (data not shown). Thus, the observed gel pattern is dependent upon the presence of T7 RNA polymerase and a double-stranded T7 promoter, and is not due to the added oligonucleotide PT7. Furthermore, the chief product of transcription from an array-bound template displays the same gel migration rate as the chief product of positive-control transcription reactions. The most likely explanation for the observed data is that we have reduced to practice the T7 RNA polymerase version of enzymatic oligonucleotide production from an array template.

It is evident that the subject invention provides a number of advantages over current target nucleic acid generation protocols. These advantages include the provision of an economical and rapid synthesis method for custom primer mixtures that are particularly suited for use in target generation for use with nucleic acid arrays. Using the subject methods leads to increased specificity in microarray based assays. Using the subject methods, one can develop microarray based assays in which the microarray is customized to be sensitive or insensitive to various splicing variants of different genes of interest, even where the splicing variant is present proximal to the 5′ end of the coding sequence. Allele specific mRNA profiling is possible with the subject methods by picking the variable region so that the 3′-end of the primer produced hybridizes at a base where the two alleles differ. In addition, the subject methods can be employed to easily produce normalized target nucleic acid mixtures. Accordingly, the invention represents a significant contribution to the art.

All publications and patent applications cited in this specification are herein incorporated by reference as if each individual publication or patent application were specifically and individually indicated to be incorporated by reference. The citation of any publication is for its disclosure prior to the filing date and should not be construed as an admission that the present invention is not entitled to antedate such publication by virtue of prior invention.

Although the foregoing invention has been described in some detail by way of illustration and example for purposes of clarity of understanding, it is readily apparent to those of ordinary skill in the art in light of the teachings of this invention that certain changes and modifications may be made thereto without departing from the scope of the appended claims.

1-21. (canceled)
22. A method for producing a mixture of nucleic acids, said method comprising: (a) providing an array of distinct single-stranded probe nucleic acids of differing sequence immobilized on a surface of a planar substrate where each distinct probe present on said array comprises a constant domain and a complement variable domain; wherein said complement variable domain is at the 5′ end of said each distinct probe; (b) hybridizing nucleic acids complementary to said constant domain with said array of single-stranded probe nucleic acids to produce a template array of overhang comprising duplex nucleic acids, wherein each overhang comprising duplex nucleic acid of said array comprises a double-stranded constant region and a single-stranded variable region overhang; (c) subjecting said template array of overhang comprising duplex nucleic acids to a cyclic reaction or an in vitro transcription protocol to produce a mixture of single stranded nucleic acids of differing sequence; and (d) separating said mixture of nucleic acids from said template array.
23. The method according to claim 22, wherein said mixture of nucleic acids is a mixture of deoxyribo-oligonucleotides.
24. The method according to claim 22, wherein said step (c) comprises a cyclic reaction.
25. The method according to claim 24, wherein said cyclic reaction comprises a protocol selected from the group consisting of: linear PCR and strand displacement amplification.
26. The method according to claim 22, wherein said constant domain comprises at least one domain selected from the group consisting of: a linker domain; a functional domain and a recognition domain.
27. The method according to claim 22, wherein said step (c) comprises an in vitro transcription protocol.
28. The method according to claim 27, wherein said constant domain comprises at least one domain selected from the group consisting of: a linker domain; a functional domain and a recognition domain.
29. The method according to claim 28, wherein said functional domain is an RNA polymerase promoter domain.
30. The method according to claim 22, wherein said array is described by the formula: surface-L-R-F-cV-5′ wherein: L is an optional linking domain; R is a recognition domain; F is a functional domain; and cV is said complement domain.
31. The method according to claim 30, wherein said hybridizing step (b) comprises contacting said array with a population of nucleic acids of the formula: 5′-cR-cF-3′ wherein: cR is the complement of R; and cF is the complement of F.
32. The method according to claim 31, wherein said template array of overhang comprising duplex nucleic acids is described by the formula:


33. The method according to claim 32, wherein each distinct constituent member of said mixture produced by said method comprises a different variable domain V.
34. The method according to claim 30, wherein said recognition domain is recognized by a restriction endonuclease.
35. The method according to claim 22, wherein said array comprises at least about 50 different single-stranded probe nucleic acids of differing sequence.
36. The method according to claim 35, wherein said mixture of nucleic acids produced by said method comprises at least about 50 nucleic acids of differing sequence.
37. The method according to claim 36, wherein each constituent member of said mixture ranges in length from about 20 to 60 nt.
38. A method according to claim 22, wherein said method further comprises employing said mixture of nucleic acids as primers in a target generation step in which target nucleic acids are produced from an mRNA sample to produce a population of target nucleic acids.
39. The method according to claim 38, wherein said target generation step (b) comprises a template driven primer extension reaction.
40. The method according to claim 38, wherein said target generation step (b) produces labeled target nucleic acids.
41. The method according to claim 38, wherein said method further comprises contacting said set of target nucleic acids with an array of probe nucleic acids under hybridization conditions and detecting the presence of target nucleic acids hybridized to probe nucleic acids of said array.
42. The method according to claim 41, wherein said target nucleic acids are labeled.
43. The method according to claim 41, wherein said method further comprises washing unbound target away from the surface of said array.